428 research outputs found

    Dictionary matching in a stream

    We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
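To make the problem statement concrete, here is a minimal sketch of streaming dictionary matching using the classical Aho-Corasick automaton. This is a deterministic baseline that stores the whole dictionary, not the randomised small-space algorithm from the abstract; the class name `StreamMatcher` and its `feed` method are hypothetical names chosen for illustration.

```python
from collections import deque


class StreamMatcher:
    """Aho-Corasick automaton fed one character at a time (baseline sketch)."""

    def __init__(self, dictionary):
        self.goto = [{}]   # trie edges of each node
        self.fail = [0]    # failure link of each node
        self.out = [[]]    # dictionary strings ending at each node
        for pattern in dictionary:
            node = 0
            for ch in pattern:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(pattern)
        # Breadth-first traversal sets failure links level by level.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)
        self.state = 0

    def feed(self, ch):
        """Consume one stream character; return the strings ending here."""
        while self.state and ch not in self.goto[self.state]:
            self.state = self.fail[self.state]
        self.state = self.goto[self.state].get(ch, 0)
        return list(self.out[self.state])
```

A caller would create one matcher per dictionary and call `feed` for every arriving character, reporting whatever it returns. The space used here is proportional to the total dictionary length, which is exactly the cost the abstract's O(k log m)-word bound is meant to avoid.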

    Efficient comparison based string matching


    Online Detection of Repetitions with Backtracking

    In this paper we present two algorithms for the following problem: given a string and a rational $e > 1$, detect in an online fashion the earliest occurrence of a repetition of exponent $\ge e$ in the string. 1. The first algorithm supports the backtrack operation, which removes the last letter of the input string. This solution runs in $O(n\log m)$ time and $O(m)$ space, where $m$ is the maximal length of a string generated during the execution of a given sequence of $n$ read and backtrack operations. 2. The second algorithm works in $O(n\log\sigma)$ time and $O(n)$ space, where $n$ is the length of the input string and $\sigma$ is the number of distinct letters. This algorithm is relatively simple and requires much less memory than the previously known solution with the same working time and space. Comment: 12 pages, 5 figures, accepted to CPM 201
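As an illustration of the problem only, the sketch below finds the shortest prefix that ends with a repetition of exponent at least e, taking the exponent of a string to be its length divided by its smallest period. It is a quadratic brute force built on the standard prefix function, not either of the paper's algorithms, and it does not support backtracking; the function names are hypothetical.

```python
from fractions import Fraction


def prefix_function(s):
    """pi[i] = length of the longest proper border of s[:i + 1]."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def earliest_repetition(s, e):
    """Length of the shortest prefix of s whose suffix has exponent >= e.

    Assumes e > 1. Returns None when no such repetition occurs.
    """
    e = Fraction(e)
    for end in range(1, len(s) + 1):
        # Prefixes of the reversed prefix are the suffixes of s[:end];
        # reversal does not change the smallest period.
        pi = prefix_function(s[:end][::-1])
        for length in range(1, end + 1):
            period = length - pi[length - 1]
            if Fraction(length, period) >= e:
                return end
    return None
```

Exact rational arithmetic via `Fraction` avoids floating-point error when the threshold is a non-integer exponent such as 3/2.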

    Ligand-induced formation of nucleic acid triple helices.


    A Very Large Array Search for 5 GHz Radio Transients and Variables at Low Galactic Latitudes

    We present the results of a 5 GHz survey with the Very Large Array (VLA) and the expanded VLA, designed to search for short-lived (≾1 day) transients and to characterize the variability of radio sources at milli-Jansky levels. A total sky area of 2.66 deg^2, spread over 141 fields at low Galactic latitudes (b≅6-8 deg), was observed 16 times with a cadence that was chosen to sample timescales of days, months, and years. Most of the data were reduced, analyzed, and searched for transients in near real-time. Interesting candidates were followed up using visible light telescopes (typical delays of 1-2 hr) and the X-ray Telescope on board the Swift satellite. The final processing of the data revealed a single possible transient with a peak flux density of f_ν≅2.4 mJy. This implies a transient's sky surface density of κ(f_ν > 1.8 mJy) = 0.039^(+0.13,+0.18)_(–0.032,–0.038) deg^(–2) (1σ, 2σ confidence errors). This areal density is roughly consistent with the sky surface density of transients from the Bower et al. survey extrapolated to 1.8 mJy. Our observed transient areal density is consistent with a neutron star's origin for these events. Furthermore, we use the data to measure the source variability on timescales of days to years, and we present the variability structure function of 5 GHz sources. The mean structure function shows a fast increase on ≈1 day timescale, followed by a slower increase on timescales of up to 10 days. On timescales between 10 and 60 days, the structure function is roughly constant. We find that ≳30% of the unresolved sources brighter than 1.8 mJy are variables at the >4σ confidence level, presumably mainly due to refractive scintillation.

    Fast Algorithm for Partial Covers in Words

    A factor $u$ of a word $w$ is a cover of $w$ if every position in $w$ lies within some occurrence of $u$ in $w$. A word $w$ covered by $u$ thus generalizes the idea of a repetition, that is, a word composed of exact concatenations of $u$. In this article we introduce a new notion of $\alpha$-partial cover, which can be viewed as a relaxed variant of cover, that is, a factor covering at least $\alpha$ positions in $w$. We develop a data structure of $O(n)$ size (where $n=|w|$) that can be constructed in $O(n\log n)$ time which we apply to compute all shortest $\alpha$-partial covers for a given $\alpha$. We also employ it for an $O(n\log n)$-time algorithm computing a shortest $\alpha$-partial cover for each $\alpha=1,2,\ldots,n$.
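The definition of an α-partial cover can be checked directly with a brute-force sketch such as the one below. It only illustrates the definition and runs in polynomial time far above the abstract's O(n log n) bound; the function names `covered_positions` and `shortest_partial_covers` are hypothetical.

```python
def covered_positions(w, u):
    """Number of positions of w lying inside at least one occurrence of u."""
    covered = [False] * len(w)
    start = w.find(u)
    while start != -1:
        for i in range(start, start + len(u)):
            covered[i] = True
        # Continue from the next position so overlapping occurrences count.
        start = w.find(u, start + 1)
    return sum(covered)


def shortest_partial_covers(w, alpha):
    """All shortest factors of w that cover at least alpha positions."""
    n = len(w)
    for length in range(1, n + 1):
        factors = {w[i:i + length] for i in range(n - length + 1)}
        hits = sorted(u for u in factors if covered_positions(w, u) >= alpha)
        if hits:
            return hits
    return []
```

For the word `abacaba`, the factor `aba` occurs at positions 0 and 4 and so covers 6 of the 7 positions; it is a 6-partial cover but not a cover in the strict sense.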

    Efficient Seeds Computation Revisited

    The notion of the cover is a generalization of a period of a string, and there are linear time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood, and no linear time algorithm is known. In the paper we give linear time algorithms for some of its versions: computing the shortest left-seed array, computing the longest left-seed array, and checking for seeds of a given length. The algorithm for the last problem is used to compute the seed array of a string (i.e., the shortest seeds for all the prefixes of the string) in $O(n^2)$ time. We also describe a simpler alternative algorithm that efficiently computes the shortest seeds. As a by-product we obtain an $O(n\log(n/m))$ time algorithm checking if the shortest seed has length at least $m$ and finding the corresponding seed. We also correct some important details missing in the previously known shortest-seed algorithm (Iliopoulos et al., 1996). Comment: 14 pages, accepted to CPM 201

    Covering Problems for Partial Words and for Indeterminate Strings

    We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove that the indeterminate string covering problem and the partial word covering problem are NP-complete for a binary alphabet and show that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols. For the indeterminate string covering problem we obtain a $2^{O(k \log k)} + n k^{O(1)}$-time algorithm. For the partial word covering problem we obtain a $2^{O(\sqrt{k}\log k)} + nk^{O(1)}$-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{O(1)}$-time solution exists for either problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice. Comment: full version (simplified and corrected); preliminary version appeared at ISAAC 2014; 14 pages, 4 figures

    Palindromic Decompositions with Gaps and Errors

    Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and also imposing a lower bound on the length of acceptable palindromes. We first present an algorithm for obtaining a palindromic decomposition of a string of length $n$ with the minimal total gap length in time $O(n \log n \cdot g)$ and space $O(n g)$, where $g$ is the number of allowed gaps in the decomposition. We then consider a decomposition of the string in maximal $\delta$-palindromes (i.e. palindromes with $\delta$ errors under the edit or Hamming distance) and $g$ allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time $O(n (g + \delta))$ and space $O(n g)$. Comment: accepted to CSR 201
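The first problem in this abstract (exact palindromes, at most g gaps, a lower bound on palindrome length) can be sketched with a simple dynamic program. This is a cubic-time illustration of the objective, not the paper's $O(n \log n \cdot g)$ algorithm, and it ignores the $\delta$-error variant; the function name and the parameters `max_gaps` and `min_len` are hypothetical.

```python
def min_gap_decomposition(s, max_gaps, min_len=1):
    """Minimal total gap length when s is split into palindromes of length
    >= min_len and at most max_gaps gaps; None if no decomposition exists."""
    n = len(s)
    # is_pal[i][j] is True when s[i:j + 1] is a palindrome.
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            is_pal[i][j] = s[i] == s[j] and (j - i < 2 or is_pal[i + 1][j - 1])

    inf = float("inf")
    # best[g][i]: minimal total gap length for s[:i] using at most g gaps.
    best = [[inf] * (n + 1) for _ in range(max_gaps + 1)]
    for g in range(max_gaps + 1):
        best[g][0] = 0
        for i in range(1, n + 1):
            for k in range(i):
                # Last block s[k:i] is an acceptable palindrome.
                if i - k >= min_len and is_pal[k][i - 1]:
                    best[g][i] = min(best[g][i], best[g][k])
                # Last block s[k:i] is a gap, which uses up one gap.
                if g > 0:
                    best[g][i] = min(best[g][i], best[g - 1][k] + (i - k))
    result = best[max_gaps][n]
    return None if result == inf else result
```

For example, with `min_len=2` the string `abaxcdc` cannot be decomposed without gaps, while one gap of length 1 (the letter `x`) suffices.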